We propose and solve exactly a model of a network that has both a tunable degree distribution and 
a tunable clustering coefficient. Among other things, our results indicate that increased clustering 
leads to a decrease in the size of the giant component of the network. We also study SIR-type 
epidemic processes within the model and find that clustering decreases the size of epidemics, but 
also decreases the epidemic threshold, making it easier for diseases to spread. In addition, clustering 
causes epidemics to saturate sooner, meaning that they infect a near-maximal fraction of the network 
for quite low transmission rates. 



I. INTRODUCTION 

There has in recent years been considerable interest within the physics community in the structure and dynamics of networks, with applications to the Internet, the World-Wide Web, citation networks, and social and biological networks [...]. Two significant properties of networks have been particularly highlighted. First, one observes for most networks that the degree distribution is highly non-Poissonian [...]. (A network consists of a set of nodes or "vertices", joined by lines or "edges", and the degree of a vertex is the number of edges attached to that vertex.) Histograms of vertex degree for many networks show a power-law form with exponent typically between −2 and −3, while other networks may have exponential or truncated power-law distributions. Second, it is found that most networks have a high degree of transitivity or clustering, i.e., that there is a high probability that "the friend of my friend is also my friend" [9]. In topological terms, this means that there is a heightened density of loops of length three in the network, and more generally it is found that networks have a heightened density of short loops of various lengths [...].

It is now well understood how to calculate the properties of networks with arbitrary degree distributions [...], but where clustering is concerned our understanding is much poorer. Most of the standard techniques used to solve network models break down when clustering is introduced, obliging researchers to turn to numerical methods [9, ...].

In this paper, we present a plausible network model 
that incorporates both non-Poisson degree distributions 
and non-trivial clustering, and which is exactly solvable 
for many of its properties, including component sizes, 
percolation threshold, and clustering coefficient. Our re- 
sults show that clustering can have a substantial effect 
on the large-scale structure of networks, and produces 
behaviors that are both quantitatively and qualitatively 
different from the simple non-clustered case. 

The outline of the paper is as follows. In Sec. II we define our model and in Sec. III we derive exact expressions for a variety of its properties. In Sec. IV we discuss the form of these expressions for some sensible choices of the parameters, and also consider the behavior of epidemic processes within our model. In Sec. V we give our conclusions.



II. THE MODEL 

There is empirical evidence that clustering in networks
arises because the vertices are divided into groups [...],
with a high density of edges between members of the
same group, and hence a high density of triangles, even 
though the density of edges in the network as a whole 
may be low. Our model is perhaps the simplest and most 
obvious realization of this idea. We describe it here in the 
anthropomorphic language of social networks, although 
our arguments apply equally to non-social networks. 

We consider a network of N individuals divided into M groups. A social network, for example, might be divided up according to the location, interests, occupation, and so forth of its members. (Many networks are indeed known to be divided into such groups [...].) Individuals can belong to more than one group, the groups they belong to being chosen, in our model, at random. Individuals are not necessarily acquainted with all other members of their groups. If two individuals belong to the same group then there is a probability p that they are acquainted and q = 1 − p that they are not; if they have no groups in common then they are not acquainted. (A more sophisticated model in which there are many nested levels of groups within groups and a spectrum of acquaintance probabilities depending on these levels has been proposed and studied numerically by Watts et al. [22]. For this paper, however, we confine ourselves to the simpler case.) In addition to the probability p, the model is parametrized by two probability distributions: $r_m$ is the probability that an individual belongs to m groups and $s_n$ is the probability that a group contains n individuals.

Mathematically, the model can be regarded as a bond 
percolation model on the one-mode projection of a bi- 
partite random graph. The structure of individuals and 
groups forms the bipartite graph, the network of shared 
groups is the projection of that graph onto the individuals 
alone, and the probability p that one of the possible con- 
tacts in this projection is actually realized corresponds to 
a bond percolation process on the projection. See Fig. 1.
FIG. 1: The structure of the network model described in this paper. (a) We represent individuals (A-K) and the groups (1-4) to which they belong with a bipartite graph structure. (b) The bipartite graph is projected onto the individuals only. (c) The connections between individuals are chosen by bond percolation on this projection with bond occupation probability p. The net result is that individuals have probability p of knowing others with whom they share a group.
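A short simulation makes the construction concrete. The Python sketch below is illustrative only. It simplifies the model by assuming that every individual belongs to exactly m groups and every group has exactly ν members, and its function names are hypothetical. Memberships are paired with groups at random, as in a configuration model. Each pair of people sharing a group is then joined with probability p, which is bond percolation on the one-mode projection. The clustering coefficient is measured as three times the number of triangles divided by the number of connected triples.

```python
import random
from itertools import combinations


def build_network(N, m, nu, p, seed=0):
    """Sample the model: N people, each in exactly m groups of exactly nu members.

    Memberships are matched to groups at random (configuration-model style),
    then each pair of people sharing a group is joined with probability p.
    """
    if (N * m) % nu:
        raise ValueError("N*m must be divisible by nu")
    rng = random.Random(seed)
    stubs = [person for person in range(N) for _ in range(m)]  # one entry per membership
    rng.shuffle(stubs)
    groups = [set(stubs[i:i + nu]) for i in range(0, len(stubs), nu)]
    adj = [set() for _ in range(N)]
    for members in groups:
        for a, b in combinations(sorted(members), 2):
            if rng.random() < p:          # bond percolation on the projection
                adj[a].add(b)
                adj[b].add(a)
    return adj


def clustering_coefficient(adj):
    """C = 3 * triangles / connected triples; each triangle is seen once per corner."""
    closed = triples = 0
    for nbrs in adj:
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples


adj = build_network(N=6000, m=2, nu=5, p=0.5)
mean_degree = sum(len(nbrs) for nbrs in adj) / len(adj)
C = clustering_coefficient(adj)
print(mean_degree, C)
```

Consider m = 2, ν = 5 and p = 0.5. A short calculation for large N gives $\langle k\rangle = pm(\nu-1) = 4$ and $C = 3/14 \approx 0.21$. Each person has on average $\binom{8}{2}p^2 = 7$ pairs of neighbors, and $2\binom{4}{2}p^3 = 1.5$ of those pairs are themselves acquainted. The sampled values should lie close to these. Small finite-size deviations are expected, for example from the occasional pair of people who share two groups.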



III. ANALYTIC DEVELOPMENTS 

We can derive a variety of exact results for our model in 
the limit of large size using generating function methods. 
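There are four fundamental generating functions that we will use:

$$f_0(z) = \sum_{m=0}^{\infty} r_m z^m, \qquad f_1(z) = \frac{1}{\mu}\sum_{m=0}^{\infty} m\, r_m z^{m-1}, \tag{1}$$

$$g_0(z) = \sum_{n=0}^{\infty} s_n z^n, \qquad g_1(z) = \frac{1}{\nu}\sum_{n=0}^{\infty} n\, s_n z^{n-1}, \tag{2}$$

where $\mu = \sum_m m\, r_m$ and $\nu = \sum_n n\, s_n$ are the mean number of groups per individual and the mean number of individuals per group.

A. Degree distribution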



Consider a randomly chosen person A, who belongs to some number of groups m. The number j of A's acquaintances within one particular group of size n is binomially distributed according to $\binom{n-1}{j} p^j q^{n-1-j}$. We represent this distribution by its generating function:

$$\sum_{j=0}^{n-1} \binom{n-1}{j} p^j q^{n-1-j} z^j = (pz+q)^{n-1}. \tag{3}$$

Averaging over group size, the full generating function for neighbors in a single group is $\nu^{-1}\sum_{n=0}^{\infty} n s_n (pz+q)^{n-1} = g_1(pz+q)$, and for neighbors of a single person is $f_0(g_1(pz+q))$. This allows us to calculate the degree distribution for any given $\{r_m, s_n\}$, and by judicious choice of the fundamental distributions, we can arrange for the degree distribution to take a wide variety of forms. We give some examples shortly. The mean degree $\langle k\rangle$ of an individual in the network is given by

$$\langle k\rangle = \Bigl[\frac{d}{dz} f_0\bigl(g_1(pz+q)\bigr)\Bigr]_{z=1} = p\mu\, g_1'(1). \tag{4}$$
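The degree distribution generated by $f_0(g_1(pz+q))$ can be computed with ordinary polynomial arithmetic when $r_m$ and $s_n$ have finite support. The Python sketch below assumes the distributions are supplied as dicts, and its helper names are hypothetical. It expands $(pz+q)^{n-1}$ binomially, averages over group sizes with weights $n s_n/\nu$, and composes the result with $f_0$.

```python
from math import comb


def poly_mul(a, b):
    """Product of two polynomials stored as coefficient lists (lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def degree_distribution(r, s, p):
    """Return [P(k=0), P(k=1), ...], the coefficients of f_0(g_1(pz + q)).

    r maps m -> r_m (groups per person) and s maps n -> s_n (people per group);
    both are assumed to have finite support, with every group size n >= 1.
    """
    q = 1.0 - p
    nu = sum(n * sn for n, sn in s.items())
    # g_1(pz + q) = nu^{-1} sum_n n s_n (pz + q)^{n-1}, expanded binomially
    inner = [0.0] * max(s)
    for n, sn in s.items():
        weight = n * sn / nu
        for j in range(n):
            inner[j] += weight * comb(n - 1, j) * p**j * q**(n - 1 - j)
    # f_0(inner) = sum_m r_m * inner^m
    result = [0.0]
    power = [1.0]                              # inner^0
    for m in range(max(r) + 1):
        if m > 0:
            power = poly_mul(power, inner)
        rm = r.get(m, 0.0)
        if rm:
            result += [0.0] * (len(power) - len(result))
            for k, c in enumerate(power):
                result[k] += rm * c
    return result


pk = degree_distribution(r={1: 0.5, 3: 0.5}, s={4: 0.5, 6: 0.5}, p=0.3)
mean = sum(k * c for k, c in enumerate(pk))
print(mean)   # should match p * mu * g_1'(1) = 0.3 * 2 * 4.2 = 2.52 up to rounding
```

The mean of the computed distribution is a useful consistency check against Eq. (4). Here μ = 2 and $g_1'(1) = (12\cdot 0.5 + 30\cdot 0.5)/5 = 4.2$.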



B. Clustering coefficient 

The clustering coefficient C is a measure of the level of clustering in a network [...]. It is defined as the mean probability that two vertices in a network are connected, given that they share a common network neighbor. Mathematically it can be written as three times the ratio of the number of triangles $N_\triangle$ in the network to the number of connected triples of vertices $N_3$ [...]. In the present case, triangles arise (in the limit of large size) only among members of a common group, and a vertex of degree k is the center of $\frac12 k(k-1)$ connected triples, so that we have

$$N_\triangle = \tfrac16 N\mu p^3\, g_1''(1), \qquad N_3 = \tfrac12 N p^2 \bigl[\mu\, g_1''(1) + f_0''(1)\,[g_1'(1)]^2\bigr], \tag{5}$$

and hence the clustering coefficient is

$$C = \frac{3N_\triangle}{N_3} = \frac{p\,\mu\, g_1''(1)}{\mu\, g_1''(1) + f_0''(1)\,[g_1'(1)]^2} = pC_1, \tag{6}$$

where $C_1$ is the clustering coefficient of the simple one-mode projection of the bipartite graph, Fig. 1(b). In other words, one can interpolate smoothly and linearly from C = 0 to the maximum possible value for this type of graph, simply by varying p. (In the limit C = 0 our model becomes equivalent to the standard unclustered random graphs studied previously [...].) The average number of groups to which people belong and the parameter p give us two independent parameters that we can vary to allow us to change C while keeping the mean degree $\langle k\rangle$ constant. Alternatively, and perhaps more logically, we can regard C and $\langle k\rangle$ as the defining parameters for the model and calculate the appropriate values of other quantities from these.

$$P(10|10) = \cdots + 344847947664\,p^{30}q^{15} + 166867565040\,p^{31}q^{14} + 73005619995\,p^{32}q^{13} + 28759950345\,p^{33}q^{12} + 10150589610\,p^{34}q^{11} + 3190186926\,p^{35}q^{10} + 886163125\,p^{36}q^{9} + 215553195\,p^{37}q^{8} + 45379620\,p^{38}q^{7} + 8145060\,p^{39}q^{6} + 1221759\,p^{40}q^{5} + 148995\,p^{41}q^{4} + 14190\,p^{42}q^{3} + 990\,p^{43}q^{2} + 45\,p^{44}q + p^{45}$$

TABLE I: The polynomials P(k|k) for values of k up to 10.

The local clustering coefficient $C_i$ for a vertex i has also been the subject of recent study. $C_i$ is defined to be the fraction of pairs of neighbors of i that are neighbors also of each other [...]. For a variety of real-world networks $C_i$ is found to fall off with the degree $k_i$ of the vertex as $C_i \sim k_i^{-1}$ [...]. This behavior is reproduced nicely by our model. Vertices with higher degree belong to more groups in proportion to $k_i$. Connected pairs of neighbors come almost entirely from within single groups, so the number of such pairs also grows only in proportion to $k_i$. The total number of pairs of neighbors, on the other hand, is $\frac12 k_i(k_i-1)$, and the combination gives precisely $C_i \sim k_i^{-1}$ as $k_i$ becomes large.
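Both $\langle k\rangle = p\mu g_1'(1)$ and the clustering coefficient $C = p\mu g_1''(1)/[\mu g_1''(1) + f_0''(1)\,g_1'(1)^2]$ depend on $r_m$ and $s_n$ only through a few factorial moments. They are therefore easy to evaluate numerically. The Python sketch below assumes the distributions are given as dicts with finite support, so a Poisson $r_m$ has to be truncated. The helper names are hypothetical.

```python
from math import exp, factorial


def factorial_moments(dist):
    """Return sum x d_x, sum x(x-1) d_x and sum x(x-1)(x-2) d_x for a dict x -> d_x."""
    first = sum(x * d for x, d in dist.items())
    second = sum(x * (x - 1) * d for x, d in dist.items())
    third = sum(x * (x - 1) * (x - 2) * d for x, d in dist.items())
    return first, second, third


def mean_degree_and_clustering(r, s, p):
    """Mean degree <k> = p mu g_1'(1) and clustering coefficient C = p C_1."""
    mu, f0_pp, _ = factorial_moments(r)       # mu = f_0'(1), f0_pp = f_0''(1)
    nu, s2, s3 = factorial_moments(s)
    g1_p, g1_pp = s2 / nu, s3 / nu            # g_1'(1) and g_1''(1)
    k_mean = p * mu * g1_p
    C1 = mu * g1_pp / (mu * g1_pp + f0_pp * g1_p**2)
    return k_mean, p * C1


# everyone in exactly two groups of ten, every possible acquaintance present
print(mean_degree_and_clustering({2: 1.0}, {10: 1.0}, p=1.0))   # (18.0, 0.4705...)

# truncated Poisson r_m with mean 1.5, groups of ten, p = 0.4
r_poisson = {m: exp(-1.5) * 1.5**m / factorial(m) for m in range(60)}
print(mean_degree_and_clustering(r_poisson, {10: 1.0}, p=0.4))
```

The first case can be checked by hand. Each person's 18 neighbors form two cliques of nine, so 72 of the 153 neighbor pairs are joined, giving C = 8/17.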



C. Component structure 

To solve for the component structure of the model we 
focus on acquaintance patterns within a single group. 
Suppose person A belongs to a group of n people. We 
would like to know how many individuals within that 
group A is connected to, either directly (via a single edge) 
or indirectly (via any path through other members of the 
group). Let P(k|n) be the probability that vertex A belongs to a connected cluster of k vertices in the group, including itself. We have

$$P(k|n) = \binom{n-1}{k-1}\, q^{k(n-k)}\, P(k|k), \tag{7}$$

which follows since we can make an appropriate graph of n labeled vertices by taking a graph of k vertices, to all of which A is connected, and adding n − k others to it, which we can do in $\binom{n-1}{k-1}$ distinct ways, each with probability $q^{k(n-k)}$ (the probability that none of the newly added vertices connects to any of the k old vertices).

The probabilities P(k|k) are polynomials in p of order $s = \frac12 k(k-1)$ that can be written in the form

$$P(k|k) = \sum_{l=k-1}^{s} M_l^k\, p^l q^{s-l}, \tag{8}$$

where $M_l^k$ is the number of labeled connected graphs with k vertices and l edges. While some progress can be made in evaluating the $M_l^k$ by analytic methods (see Appendix A), the resulting expressions are poorly suited to mechanical enumeration of P(k|k). For practical purposes, it is simpler to observe that

$$P(k|k) = 1 - \sum_{j=1}^{k-1} P(j|k), \tag{9}$$

which in combination with Eq. (7) allows us to evaluate P(k|k) iteratively, given the initial condition P(1|1) = 1. In Table I we give the polynomials P(k|k) for k up to 10.
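The same decomposition can be run in exact integer arithmetic on the counts $M_l^k$ of Eq. (8), which reproduces the coefficients of Table I. The Python sketch below is one way to do this, and the function names are hypothetical. Start from all graphs on k labeled vertices. Then subtract those in which vertex 1 lies in a connected piece of j < k vertices with no edges to the remaining k − j vertices. This is the counting version of Eqs. (7) and (9).

```python
from math import comb


def connected_graph_counts(kmax):
    """M[k][l] = number of labeled connected graphs with k vertices and l edges."""
    M = [None, [1]]                                   # one vertex, zero edges: connected
    for k in range(2, kmax + 1):
        s = comb(k, 2)
        counts = [comb(s, l) for l in range(s + 1)]   # all graphs on k labeled vertices
        for j in range(1, k):
            ways = comb(k - 1, j - 1)                 # choose vertex 1's j - 1 companions
            free = comb(k - j, 2)                     # possible edges among the other k - j
            for a, count in enumerate(M[j]):          # a edges inside the connected piece
                if count:
                    for b in range(free + 1):         # any b edges among the others
                        counts[a + b] -= ways * count * comb(free, b)
        M.append(counts)
    return M


def P_kk(k, p, M):
    """Eq. (8): P(k|k) = sum_l M_l^k p^l q^(s - l), with q = 1 - p."""
    q, s = 1.0 - p, comb(k, 2)
    return sum(count * p**l * q**(s - l) for l, count in enumerate(M[k]))


def P_kn(k, n, p, M):
    """Eq. (7): P(k|n) = C(n-1, k-1) q^(k(n-k)) P(k|k)."""
    return comb(n - 1, k - 1) * (1.0 - p) ** (k * (n - k)) * P_kk(k, p, M)


M = connected_graph_counts(10)
print(M[10][-3:])                                        # [990, 45, 1]
print(sum(P_kn(k, 10, 0.3, M) for k in range(1, 11)))    # 1 up to rounding
```

The last three entries are the coefficients of $p^{43}q^2$, $p^{44}q$ and $p^{45}$ in P(10|10). Summing P(k|n) over k = 1, …, n checks that the probabilities are normalized.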

The generating function for the number of vertices to which A is connected, by virtue of belonging to this group of size n, is:

$$h_n(z) = \sum_{k=1}^{n} P(k|n)\, z^{k-1}. \tag{10}$$

Notice the appearance of $z^{k-1}$: this is a generating function for the number of vertices A is connected to excluding itself. Averaging over the size distribution of groups then gives $h(z) = \nu^{-1}\sum_n n s_n h_n(z)$, and the total number of others to whom A is connected via all the groups they belong to is generated by $G_0(z) = f_0(h(z))$, where $f_0(z)$ is defined in Eq. (1). If we reach an individual by following a randomly chosen edge, then we are more likely to arrive at individuals who belong to a large number of groups. This means that the distribution of other groups to which such an individual belongs is generated by the function $f_1(z)$ in Eq. (1), and the number of other individuals to which they are connected is generated by $G_1(z) = f_1(h(z))$.

Armed with these results, we can now calculate a variety of quantities for our model. We focus on two in particular, the position of the percolation threshold and the size of the giant component. The distribution of the number of individuals one step away from person A is generated by the function $G_0(z)$, while the number two steps away is generated by $G_0(G_1(z))$. There is a giant component in the network if and only if the average number two steps away exceeds the average number one step away [14]. (This is a natural criterion: it implies that the number of people reachable is increasing with distance.) Thus there is a giant component if $\bigl[\frac{d}{dz}\bigl(G_0(G_1(z)) - G_0(z)\bigr)\bigr]_{z=1} > 0$. Since $h(1) = 1$, this derivative equals $G_0'(1)\,[G_1'(1) - 1]$, and substituting $G_1'(1) = f_1'(1)\,h'(1)$ the condition can be written

$$f_1'(1)\, h'(1) > 1. \tag{11}$$

When this condition is satisfied and there is a giant component, we define u to be the probability that one of the individuals to whom A is connected is not a member of this giant component. A is also not a member provided all of its neighbors are not, so that u satisfies the self-consistency condition $u = G_1(u)$. Then the size of the giant component is given by $S = 1 - G_0(u)$.
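The threshold condition and the self-consistency equation for u translate into a few lines of numerical code. The Python sketch below assumes the special case used later in Sec. IV. Every group has the same size ν, and $r_m$ is Poisson with mean μ, so $f_0(z) = f_1(z) = e^{\mu(z-1)}$ and therefore $G_0 = G_1$. The sketch evaluates P(k|k) with the iteration of Eqs. (7) and (9), builds $h_\nu(z)$, and tests Eq. (11). It then solves $u = G_1(u)$ by fixed-point iteration from u = 0. Because $G_1$ is increasing on [0, 1], this iteration converges to the smallest root. The function names are hypothetical.

```python
from math import comb, exp


def h_coeffs(nu, p):
    """Coefficients of h_nu(z) = sum_k P(k|nu) z^(k-1); index i is the power of z."""
    q = 1.0 - p
    P = [0.0, 1.0]                                   # P[k] = P(k|k), with P(1|1) = 1
    for k in range(2, nu + 1):                       # Eq. (9) combined with Eq. (7)
        P.append(1.0 - sum(comb(k - 1, j - 1) * q**(j * (k - j)) * P[j]
                           for j in range(1, k)))
    return [comb(nu - 1, k - 1) * q**(k * (nu - k)) * P[k] for k in range(1, nu + 1)]


def giant_component_poisson(mu, nu, p, tol=1e-13, max_iter=100_000):
    """Giant-component size S for Poisson r_m (mean mu) and groups of fixed size nu."""
    c = h_coeffs(nu, p)
    if mu * sum(i * ci for i, ci in enumerate(c)) <= 1.0:   # f_1'(1) h'(1) <= 1
        return 0.0

    def G(z):                                        # G_0(z) = G_1(z) = exp(mu (h(z) - 1))
        return exp(mu * (sum(ci * z**i for i, ci in enumerate(c)) - 1.0))

    u = 0.0
    for _ in range(max_iter):                        # iterate u = G_1(u) from u = 0
        u_new = G(u)
        if abs(u_new - u) < tol:
            break
        u = u_new
    return 1.0 - G(u)                                # S = 1 - G_0(u)


print(giant_component_poisson(mu=2.0, nu=2, p=1.0))
```

A useful sanity check is ν = 2. There every group is a single potential edge, so $h_2(z) = q + pz$ and the model reduces to a Poisson random graph with mean degree μp. For μ = 2 and p = 1 the result should satisfy $S = 1 - e^{-2S}$, that is, S ≈ 0.797.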
FIG. 2: Right panel: the size of the giant component of the graph as a function of clustering coefficient for the Poisson case with group size ν = 10 and mean degree ⟨k⟩ = 5. Left panel: the size of an epidemic outbreak for an SIR model on our network as a function of transmissibility T, for values of C from 0 to 0.6 in steps of 0.1.



IV. RESULTS

As an example of the application of these results, consider the simple version of our model in which all groups have the same size n = ν. Then $h(z) = h_\nu(z)$ and the degree distribution is dictated solely by the distribution $r_m$ of the number of groups to which individuals belong. We consider two examples of this distribution, a Poisson distribution and a power-law distribution.

Let us look first at the Poisson case $r_m = \mu^m e^{-\mu}/m!$, for which the calculations are particularly simple. The Poisson distribution corresponds to choosing the members of each group independently and uniformly at random. For groups of fixed size ν we have $g_1(z) = z^{\nu-1}$, and for the Poisson distribution $f_0''(1) = \mu^2$, so from Eqs. (4) and (6) we have

$$\langle k\rangle = p\mu(\nu-1), \qquad C = \frac{p(\nu-2)}{\nu-2+\mu(\nu-1)}. \tag{12}$$

In the right-hand panel of Fig. 2 we show results for the size of the giant component as a function of clustering for the case of groups of size ν = 10 with $\langle k\rangle = 5$. As the figure shows, the giant component size decreases sharply as clustering is increased. The physical insight behind this result is that for given $\langle k\rangle$, high clustering means that there are more edges in all components, including the giant component, than are strictly necessary to hold the component together: there are many redundant paths between vertices formed by the many short loops of edges. Since fixing $\langle k\rangle$ also fixes the total number of edges, this means that the components must get smaller; the redundant edges are in a sense wasted, and the percolation properties of the network are similar to those for a network with fewer edges.



A. Epidemics

A topic of particular interest in the recent literature has been the spread of disease over networks. The classic SIR model of epidemic disease can be generalized to an arbitrary contact network, and maps onto a bond percolation model on that network with bond occupation probability equal to the transmissibility T of the disease [24]. Since we have already solved the bond percolation problem for our networks, we can also immediately solve the SIR model, by making the substitution p → pT. We show some results in the left-hand panel of Fig. 2 for the same choice of degree distributions as before. In general we see a percolation transition at some value of T, which corresponds to the epidemic threshold for the model (denoted $R_0 = 1$ in traditional mathematical epidemiology). Above this threshold there is a giant component whose size measures the number of people infected in an epidemic outbreak of the disease.

The size of the epidemic tends to the size of the giant component for the network as a whole as T → 1, as represented by the dotted lines in the figure, and is therefore typically smaller the higher the value of the clustering coefficient. However, it is interesting to note also that as C becomes large the epidemic size saturates long before T = 1, suggesting that in clustered networks epidemics will reach most of the people who are reachable even for transmissibilities that are only slightly above the epidemic threshold. This behavior stands in sharp contrast to the behavior of ordinary fully mixed epidemic models, or models on random graphs without clustering, for which epidemic size shows no such saturation [..., 26]. It arises precisely because of the many redundant paths between individuals introduced by the clustering in the network, which provide many routes for transmission of the disease, making it likely that most individuals who can catch the disease will encounter it by one route or another, even for quite moderate values of T.

As we can also see from Fig. 2, the position of the epidemic threshold decreases with increasing clustering. At first this result appears counter-intuitive. The smaller giant component for higher values of C seems to indicate that the model finds it harder to percolate, and we might therefore expect the percolation threshold to be higher. In fact, however, the many redundant paths between vertices when clustering is high make it easier for the disease to spread, not harder, and so lower the position of the threshold. Thus clustering has both bad and good sides where the spread of disease is concerned. On the one hand clustering lowers the epidemic threshold for a disease and also allows the disease to saturate the population at quite low values of the transmissibility, but on the other hand the total number of people infected is decreased.
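Below is a minimal Python sketch of the calculation behind the left panel of Fig. 2. It makes the same assumptions as the figure: groups of fixed size ν = 10, Poisson $r_m$, and ⟨k⟩ = 5. It first inverts ⟨k⟩ = pμ(ν − 1) and C = p(ν − 2)/(ν − 2 + μ(ν − 1)) to obtain (p, μ) from (C, ⟨k⟩). Eliminating μ gives the quadratic $(\nu-2)p^2 - C(\nu-2)p - C\langle k\rangle = 0$. With p = 1 this yields the largest reachable value, C = 8/13 ≈ 0.615 for these parameters. The sketch then makes the substitution p → pT and locates the threshold where $f_1'(1)h'(1) = 1$. Finally, it solves $u = G_1(u)$ for the outbreak size. The function names are hypothetical. Bisection and fixed-point iteration are simply convenient numerical choices, not necessarily the methods used to produce the figure.

```python
from math import comb, exp, sqrt


def h_coeffs(nu, x):
    """Coefficients of h_nu(z) at bond probability x, via Eqs. (7), (9) and (10)."""
    q = 1.0 - x
    P = [0.0, 1.0]                                   # P[k] = P(k|k)
    for k in range(2, nu + 1):
        P.append(1.0 - sum(comb(k - 1, j - 1) * q**(j * (k - j)) * P[j]
                           for j in range(1, k)))
    return [comb(nu - 1, k - 1) * q**(k * (nu - k)) * P[k] for k in range(1, nu + 1)]


def params_from_C(C, k_mean, nu):
    """Map (C, <k>) to (p, mu) for Poisson r_m and fixed group size nu."""
    a = nu - 2
    p = (C * a + sqrt((C * a) ** 2 + 4 * a * C * k_mean)) / (2 * a)
    if not 0.0 < p <= 1.0 + 1e-12:
        raise ValueError("this C cannot be reached for the given <k> and nu")
    p = min(p, 1.0)
    return p, k_mean / (p * (nu - 1))


def mean_reach(mu, nu, x):
    """f_1'(1) h'(1) for Poisson r_m (where f_1'(1) = mu) at bond probability x."""
    return mu * sum(i * ci for i, ci in enumerate(h_coeffs(nu, x)))


def epidemic_threshold(mu, nu, p, iters=60):
    """Smallest T with f_1'(1) h'(1) > 1 after p -> pT, located by bisection."""
    if mean_reach(mu, nu, p) <= 1.0:
        return None                                  # no epidemic even at T = 1
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_reach(mu, nu, p * mid) > 1.0:
            hi = mid
        else:
            lo = mid
    return hi


def outbreak_size(mu, nu, p, T, tol=1e-13, max_iter=200_000):
    """Epidemic size: the giant-component size with p replaced by pT."""
    c = h_coeffs(nu, p * T)
    if mu * sum(i * ci for i, ci in enumerate(c)) <= 1.0:
        return 0.0

    def G(z):                                        # G_0 = G_1 for Poisson r_m
        return exp(mu * (sum(ci * z**i for i, ci in enumerate(c)) - 1.0))

    u = 0.0
    for _ in range(max_iter):
        u_new = G(u)
        if abs(u_new - u) < tol:
            break
        u = u_new
    return 1.0 - G(u)


for C in (0.1, 0.3, 0.5):                            # parameters of Fig. 2
    p, mu = params_from_C(C, k_mean=5.0, nu=10)
    print(C, epidemic_threshold(mu, 10, p), outbreak_size(mu, 10, p, T=1.0))
```

With these parameters, both the threshold and the T = 1 outbreak size should decrease as C increases, in line with the trends described above. The threshold always lies below 1/⟨k⟩ = 0.2, because $h'(1)$ strictly exceeds the number of direct acquaintances $(\nu-1)pT$.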



B. Power-law degree distributions

Now consider the case of a power-law degree distribution. Networks with power-law degree distributions occur in many different settings and have attracted much recent attention [...]. Percolation processes on random graphs with power-law degree distributions notably always have a giant component, no matter how small the percolation probability [28]. This means for example that a disease will always spread on such a network, regardless of its transmissibility. This result can be modified by more complex network structure such as correlations between the degrees of adjacent vertices [...], but, as we now argue, it is not affected by clustering. To see why this is, note that, according to the findings reported here, we would have to reduce clustering to increase the threshold above zero, but this is not possible starting from a random graph, which has C = 0 to begin with [9]. (C is fundamentally a probability, and hence cannot take a negative value.) Mathematically, we can demonstrate that our network always percolates using Eq. (11). We can create a power-law degree distribution by making the distribution of the number of groups an individual belongs to follow a power law $r_m \sim m^{-\alpha}$. (If we wish, we can also make the distribution of group sizes follow a power law; it doesn't change the qualitative form of our results.) The bond occupation probability, and hence the transmissibility, enters Eq. (11) through the function h(z), but does not affect $f_1(z)$. We have

$$f_1'(1) = \frac{1}{\mu}\sum_m m(m-1)\, r_m = \frac{\langle m^2\rangle - \langle m\rangle}{\langle m\rangle}.$$

For α < 3, this diverges, and since $h'(1) > 0$ whenever pT > 0, Eq. (11) is always satisfied, regardless of the value of p or T.



V. CONCLUSIONS

We have introduced a solvable model of a network with non-trivial clustering, and used it to demonstrate, for instance, that increasing the clustering of a network while keeping the mean degree constant decreases the size of the giant component. Increasing the clustering also decreases the size of an epidemic for an epidemic process on the network, although it does so at the expense of decreasing the epidemic threshold too. Among other things, this means that no amount of clustering will provide us with a non-zero epidemic threshold in networks with power-law degree distributions.



APPENDIX A: PROBABILITIES FOR 
CONNECTED GRAPHS 

Equation (8) implies that we can find a general expression for P(k|k) if we can calculate the number of connected graphs with a given number of vertices and edges. The standard method for counting such graphs is to write down the exponential generating function for possibly disconnected graphs and perform an inverse exponential transform to give the so-called Riddell formula [...]:

$$\sum_{k=1}^{\infty}\sum_{l} M_l^k\, y^l\, \frac{x^k}{k!} = \log\Bigl[1 + \sum_{n=1}^{\infty} (1+y)^{n(n-1)/2}\,\frac{x^n}{n!}\Bigr]. \tag{A1}$$

Putting y → p/q, x → x/√q, and making use of Eq. (8), we then derive the following generating function for P(k|k):

$$\sum_{k=1}^{\infty} q^{-k^2/2}\, P(k|k)\, \frac{x^k}{k!} = \log\Bigl[\sum_{n=0}^{\infty} q^{-n^2/2}\,\frac{x^n}{n!}\Bigr]. \tag{A2}$$

The sum on the right-hand side is strongly divergent for |q| < 1, but progress can be made by allowing q to take a non-physical value greater than 1 and then analytically continuing to the physical regime. Using the fact that the Gaussian is its own Fourier transform:

$$e^{-t^2/2} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\, e^{izt}\, dz, \tag{A3}$$

with $t = n\sqrt{\ln q}$, which is real for q > 1, the sum can be written [33]

$$\sum_{n=0}^{\infty} q^{-n^2/2}\,\frac{x^n}{n!} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\, \exp\bigl(x\, e^{iz\sqrt{\ln q}}\bigr)\, dz, \tag{A4}$$

where we have interchanged the order of sum and integral.

Unfortunately, the integral cannot be carried out in closed form, and although some asymptotic results can be derived using saddle-point expansions, it does not appear at present that a closed-form solution for the generating function $h_n(z)$, Eq. (10), can be simply derived.
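No closed form is available, but Eq. (A2) still determines each P(k|k) if the right-hand side is treated as a formal power series in x. Only the first k coefficients enter, so the divergence of the full sum for q < 1 plays no role. The Python sketch below takes the logarithm of the series term by term. It uses the standard recurrence $b_n = a_n - \frac1n\sum_{j=1}^{n-1} j\, b_j a_{n-j}$ for $b = \log a$ with $a_0 = 1$, and compares the result with the iteration of Eqs. (7) and (9). The function names are hypothetical.

```python
from math import comb, factorial


def P_kk_from_A2(kmax, q):
    """P(k|k), k = 1..kmax, from Eq. (A2) read as a formal power series in x."""
    a = [q ** (-n * n / 2) / factorial(n) for n in range(kmax + 1)]   # a[0] = 1
    b = [0.0] * (kmax + 1)                                           # b = log(a)
    for n in range(1, kmax + 1):
        b[n] = a[n] - sum(j * b[j] * a[n - j] for j in range(1, n)) / n
    # coefficient of x^k on the left of (A2) is q^(-k^2/2) P(k|k) / k!
    return [None] + [factorial(k) * q ** (k * k / 2) * b[k] for k in range(1, kmax + 1)]


def P_kk_recursive(kmax, q):
    """The same quantities from Eqs. (7) and (9), for comparison."""
    P = [None, 1.0]
    for k in range(2, kmax + 1):
        P.append(1.0 - sum(comb(k - 1, j - 1) * q ** (j * (k - j)) * P[j]
                           for j in range(1, k)))
    return P


q = 0.7                                   # i.e. p = 0.3
print(P_kk_from_A2(6, q))
print(P_kk_recursive(6, q))               # the two lists should agree up to rounding
```

For example, both routes give $P(3|3) = 3p^2q + p^3$, which is 0.216 at p = 0.3.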